Potato Red Price Forecast

1. Introduction

Forecasting prices of staple foods such as cabbage and potatoes is important because of its direct impact on a country’s economy and society. Nepal, in this case, is a country with a large agricultural sector that contributes to a large part of the country’s GDP. The price of staple foods affects not only farmers’ incomes, but also the economy of the country. These Nepalese staple food prices are also important to forecast as they indicate consumer purchasing power and the general rate of inflation in the country. By predicting these prices one can promote informed decisions, reduce risks and promote economic stability. Forecasting prices for staple foods is and will always be important.

Main goal: accurately forecast the prices of key vegetables in Nepal over a specific time horizon.

Dataset: The data set consists of vegetables in Nepal, where each vegetable represents a time series. The data is of a daily-nature, but it has been transformed into monthly data for the sake of this analysis and forecasting project.

1.1 Data Preparation

Loading the data sets: fruits and vegetables prices in Nepal, as well as the Consumer Price Index (CPI) for Nepal. The fruits and vegetables table contains the average price for vegetables and fruits on a given day. For the time series analysis this will be aggregated to the month level by taking the average of the average daily prices in a particular month.

The generated tsibble looks like this.

# Loading Datasets.
veggies <- read_csv("../data_sources/kalimati_tarkari_dataset.csv", 
                    col_types = cols(Date = col_date(format = "%Y-%m-%d")))

suppressWarnings({
  cpi <- read_excel("../data_sources/Nepal CPI Trading Economics.xlsx", 
                    col_types = c("date", "numeric"))
})

cpi <- cpi |>
  filter(Date >= '2013-08-01 00:00:00')

# tidy up cpi
base_index <- cpi$CPI[1]
cpi <- cpi |>
  mutate(Date = yearmonth(cpi$Date),
         CPI = CPI/base_index) |>
  select(c(Date, CPI))

suppressMessages({
  veggies <- veggies %>%
    filter(Unit == "Kg") %>%
    filter(Commodity %in% c("Tomato Big(Nepali)", "Potato Red",
                            "Onion Dry (Indian)", "Carrot(Local)",
                            "Cabbage(Local)")) %>%
    mutate(Date = yearmonth(Date)) %>%
    select(Commodity, Date, Average) %>%
    group_by(Date, Commodity) %>%
    summarize(Average = mean(Average)) %>%
    ungroup()
})

veggies_ts <- veggies |>
  as_tsibble(key = Commodity,
             index = Date)

head(veggies_ts, 5)
## # A tsibble: 5 x 3 [1M]
## # Key:       Commodity [1]
##       Date Commodity      Average
##      <mth> <chr>            <dbl>
## 1 2013 Jun Cabbage(Local)    11.1
## 2 2013 Jul Cabbage(Local)    16.2
## 3 2013 Aug Cabbage(Local)    35.7
## 4 2013 Sep Cabbage(Local)    42.4
## 5 2013 Oct Cabbage(Local)    32.2

2. Time Series Graphics

This plot shows the average monthly price in Nepalese Rupees of 5 vegetables in Nepal.

veggies_ts |>
  autoplot(Average) +
  labs(y="Nepalese Rupees",
       title="Average price per month of various vegetables in Nepal") +
  theme_bw()

The following plot shows the average monthly price of Potato Red in Nepal. The graph shows some spikes around mid year accross all of the years of the observed data, and also shows a high increment in price in 2020.

potato <- veggies_ts |>
  filter(Commodity == "Potato Red")

potato |>
  autoplot(Average) +
  labs(y="Nepalese Rupee",
       title="Average Price of Potato Red in Nepal") +
  theme_bw()

Moreover, the drastic increase in price in 2020 can be seen in the following plot. 2020 is the year with the highes price of Potato Red in the second semester of the year. In this plot we can also observe how, on average, the second semester of the year sees an increase in price of the potato.

potato |>
  gg_season(Average) +
  labs(y="Nepalese Rupee",
       title="Average Price of Potato Red in Nepal") +
  theme_bw()

This plot shows the average price of Potato Red in Nepal per month, as we can observe, the month with the highest price, on average, is October. This higher price coincide with the peak season in Nepal October-November. In this months Nepal experience a surge in tourism due to trekking activities but also coincides with the biggest festival in the Country, Dashin, which lasts for 15 days in September and October. The big spike in prices in 2020 can be attributed to a slump in production as mentioned by https://myrepublica.nagariknetwork.com/news/escalating-prices-of-potato-onion-and-tomato-make-kitchen-expenses-dearer/.

potato |>
  gg_subseries(Average) +
  labs(y="Nepalese Rupee",
       title="Average Price of Potato Red in Nepal") +
  theme(axis.text.x = element_blank(),  # Hide x-axis text
        axis.ticks.x = element_blank())

The price seems to have some seasonal component but also some sort of dependence on the previous months’ prices. The following plot shows the autocorrelation plot of the Potatoes price in Nepal. As seen by the graph the price is highly correlated with the previous month prices and the correlation continues for 3 periods, then decreases.

potato |>
  ACF(Average, lag_max = 15) |> 
  autoplot() +
  labs(title="Autocorrelation plot") +
  theme_bw()

3. Time series decomposition

It’s important to note that economic series tend to increase overtime, specially prices. Therefore, an adjustment is necessary to compare the prices across time. In this analysis the CPI of Nepal will be used as a deflator, the data was obtained from INSERT REFERENCE and the base for comparison is August 2013.

potato_constant_prices <- potato |>
  inner_join(cpi, by = join_by(Date)) |>
  mutate(Average_constant_prices = Average / CPI) |>
  select(Date, Commodity, Average_constant_prices)
head(potato_constant_prices, 5)
## # A tsibble: 5 x 3 [1M]
## # Key:       Commodity [1]
##       Date Commodity  Average_constant_prices
##      <mth> <chr>                        <dbl>
## 1 2013 Aug Potato Red                    28.1
## 2 2013 Sep Potato Red                    30.3
## 3 2013 Oct Potato Red                    36.6
## 4 2013 Nov Potato Red                    38.9
## 5 2013 Dec Potato Red                    30.8

Visually a representation of the difference between the average price per month and the average price per month with constant prices. As seen by the plot the cpi deflator reduces the distance of the spikes with the lowest points in the graphics, reducing the variability.

p1 <- potato_constant_prices |>
  autoplot(Average_constant_prices) +
  labs(y="Nepalese Rupees",
       title="Potato 2013 prices") +
  theme_bw()

p2 <- potato |>
  autoplot(Average) +
  labs(y="Nepalese Rupee",
       title="Average Price of Potato") +
  theme_bw()

grid.arrange(p1, p2, ncol = 2)

In the following can be found the difference between the different transformations. The first plot is just the average price divided by the deflator, the second plot is on top of that transformation a box cox transformation and the last plot is on top of division by deflator a log transformation. The Box Cox transformation on top of the constant prices seems to be the best transformation as it somewhat reduces the spikes.

lambda_guerrero_constant_prices <- potato_constant_prices |>
  features(Average_constant_prices, features = guerrero) |>
  pull(lambda_guerrero)

# Inflation only
p1 <- potato_constant_prices |>
  autoplot(Average_constant_prices) +
  labs(y="Constant Prices",
       title="Average Price of Potato Red, 2013 prices in Nepal") +
  theme_bw()

# Box Cox
p2 <- potato_constant_prices |>
  autoplot(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)) +
  labs(y="Box Cox",
       title="Box Cox Transformation of Potato Red, 2013 prices in Nepal") +
  theme_bw()
  
p3 <- potato_constant_prices |>
  autoplot(log(Average_constant_prices)) +
  labs(y="Log",
       title="Average Log Price of Potato Red, 2013 prices in Nepal") +
  theme_bw()

grid.arrange(p1, p2, p3, ncol = 1)

The following plot shows in gray the box cox transformation of the average price of Potatoes in Nepal and the STL decomposition of that series, there is no clear trend line upwards or downwards, it looks more as a cyclical pattern, at around midyear of the even years it reaches it’s highest point in the cycle and around midyear of the odd years it reaches the lowest point.

stl_dcmp <- potato_constant_prices |>
  model(stl = STL(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)))

potato_constant_prices %>%
  autoplot(box_cox(Average_constant_prices, lambda_guerrero_constant_prices), color='gray') +
  autolayer(components(stl_dcmp), trend, color='red') +
  ylab("Box Cox Transformation of Average Price") +
  ggtitle("STL Decomposition of Average Price of Potato Red in Nepal") +
  theme_bw()

The STL decomposition uses the default window for seasonal pattern of 13 months, making the seasonal pattern the same accross the entire time series, as seen in the season_year plot, every year follows the same pattern around september and october the price increases and at the begining of the year is at the lowest point.

components(stl_dcmp) %>% autoplot()

The following image shows the X-13 Arima decomposition, the main difference is the trend component and the residuals, the Arima model appears to capture more variability in the trend component and the residuals appears to be only withe noise.

seats_dcmp <- potato_constant_prices |>
  model(X_13ARIMA_SEATS(box_cox(Average_constant_prices, lambda_guerrero_constant_prices)))

components(seats_dcmp) |>
  autoplot()

4. The forecaster’s toolbox

For forecasting we will create 4 simple models, the seasonal naive, naive, drift and mean. For the mean model, the future values are the mean of the historical data. In the naive model, the forecast is equal to the last value of the time series. Drift is a random walk, last value plus the average change. Finally, the Seasonal Naive, the forecast equals the last value from the same season. We create a mable with the 4 basic models.

four_basic_models <- potato_constant_prices|>
  model(
    Seasonal_naive = SNAIVE(Average_constant_prices),
    Naive = NAIVE(Average_constant_prices),
    Drift = RW(Average_constant_prices ~ drift()),
    Mean = MEAN(Average_constant_prices))
    
four_basic_models
## # A mable: 1 x 5
## # Key:     Commodity [1]
##   Commodity  Seasonal_naive   Naive         Drift    Mean
##   <chr>             <model> <model>       <model> <model>
## 1 Potato Red       <SNAIVE> <NAIVE> <RW w/ drift>  <MEAN>

In the following plot we can see the performance of the models in a 12 month forecast period. It’s evident from the plot that none of the models perform particularity good. There is a clear seasonality in the data, at the second semester of the year the prices increase, the seasonal naive is able to capture that. However, there is no guarantee that the unfortunate slump in production of 2020 will repeat for 2021 towards 2022. It’s also interesting to note that the drift method that consideres the trend is almost the same as the naive, the trend is negative and close to 0.

four_basic_models |>
  forecast(h = "12 months") |>
  autoplot(potato_constant_prices, level = NULL) +
  ggtitle("12 Months forecast for Average Potato Price, 2013 prices.") +
  ylab("Nepalese Rupees") +
  guides(colour = guide_legend(title = "Forecast")) +
  theme_bw()

The following four plots show the model forecasting vs the actual data. It’s evident that the worst model, the one with the largest residuals, is the mean. It does not follow any pattern in the data it just takes the sample mean. The naive and drift models follows the noise of the data closely but with a lag of one period. While the seasonal naive is able to somewhat capture the seasonality of the data but has large errors in 2015, 2018 and 2020 due to changes in the seasonality from one year to the other.

models <- c("Seasonal_naive", "Naive", "Drift", "Mean")

plots <- map(models, ~ four_basic_models |>
                     select(Commodity, all_of(.x)) |>
                     augment() |>
                     ggplot(aes(x = Date)) +
                     geom_line(aes(y = Average_constant_prices, colour = "Data")) +
                     geom_line(aes(y = .fitted, colour = "Fitted")) +
                     ggtitle(sprintf("Data and %s model fitted", .x)) +
                     ylab("Nepalese Rupees"))
                    

# Suppress warning messages while printing each plot
walk(plots, ~suppressWarnings(print(.)))

The following four plots shows the residuals of the four basic models. As seen by the plots not a single model reduces the error to white noise. All of the models still exhibit autocorrelation in the errors as seen by the ACF plots, the errors appear to be non-normally distributed, except the seasonal naive model. The conclusion is that these models are not appropriate to model or forecast the time series.

# This chunk could be replaced with a loop if R had actual proper loops instead of weird ones that doesn't weird behavior.

four_basic_models |>
  select(Commodity, Seasonal_naive) |>
  gg_tsresiduals() +
  ggtitle("residuals of Seasonal_naive")

four_basic_models %>%
  select(Commodity, Naive) %>%
  gg_tsresiduals()+
  ggtitle("residuals of Naive")

four_basic_models %>%
  select(Commodity, Drift) %>%
  gg_tsresiduals()+
  ggtitle("residuals of Drift")

four_basic_models %>%
  select(Commodity, Mean) %>%
  gg_tsresiduals()+
  ggtitle("residuals of Mean")

For testing the accuracy of the 4 simple models and compare them, a recursive window method is used, the recursive window will be 18 months, increasing by 1 month. The first table shows the accuracy of the model when forecasting a 12 period window, 12 months and the second table when forecasting only 1 month. The results shows the same conclusion that we reached above by looking at the fitted model vs the data. The drift and naive models are quite similar in results and they are good at forecasting small windows since this two models follows the noise of the data. However, when forecasting longer periods such as 12 months they do not perform as good. On the other hand, the seasonal naive model performs worse than the naive and drift for forecasting small periods of time but perform much better when forecasting longer periods such as 12 months. However, from the residuals we concluded that neither of these methods produces residuals that resembles white noise.

fit_cv <- potato_constant_prices |>
  stretch_tsibble(.init = 18, .step = 1) |>
  model(
    Seasonal_naive = SNAIVE(Average_constant_prices),
    Naive = NAIVE(Average_constant_prices),
    Drift = RW(Average_constant_prices ~ drift()),
    Mean = MEAN(Average_constant_prices))

fc_cv <- fit_cv %>%
  forecast(h=12)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 4 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Test  13.3  10.2   38.4 1.43  1.47 
## 2 Mean           Test  10.4   7.64  28.6 1.07  1.15 
## 3 Naive          Test  12.5   9.40  35.4 1.32  1.39 
## 4 Seasonal_naive Test   8.71  6.80  24.4 0.955 0.965
fc_cv <- fit_cv %>%
  forecast(h=1)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 4 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Test   5.58  4.24  15.3 0.596 0.618
## 2 Mean           Test  10.3   7.73  30.4 1.09  1.14 
## 3 Naive          Test   5.54  4.21  15.2 0.590 0.613
## 4 Seasonal_naive Test   9.01  7.08  26.5 0.995 0.998

5. Exponential smoothing

From the previous plots and models it’s clear that the data exhibits high seasonality but does not have a clear trend, upwards or down sustained in time, only seasonal variability. Therefore, the best model would be one that includes seasonality and possibly no trend. From what we’ve observed from plots the prices are heavily dependent on the season and the previous prices, therefore it’s expected that alpha will be close to 1 to weight more the most recent observations.

fit_my_ets <- potato_constant_prices |>
  model(my_ets = ETS(Average_constant_prices ~ error("A") + trend("N") + season("A")))

As expected the alpha is close to 1 meaning that the most recent observations are weighted heavily.

fit_my_ets |> 
  tidy()
## # A tibble: 15 × 4
##    Commodity  .model term    estimate
##    <chr>      <chr>  <chr>      <dbl>
##  1 Potato Red my_ets alpha   0.997   
##  2 Potato Red my_ets gamma   0.000129
##  3 Potato Red my_ets l[0]   28.1     
##  4 Potato Red my_ets s[0]    2.31    
##  5 Potato Red my_ets s[-1]  -2.56    
##  6 Potato Red my_ets s[-2]  -6.07    
##  7 Potato Red my_ets s[-3]  -6.91    
##  8 Potato Red my_ets s[-4]  -9.70    
##  9 Potato Red my_ets s[-5]  -9.40    
## 10 Potato Red my_ets s[-6]  -3.81    
## 11 Potato Red my_ets s[-7]   2.36    
## 12 Potato Red my_ets s[-8]   8.48    
## 13 Potato Red my_ets s[-9]  12.9     
## 14 Potato Red my_ets s[-10]  7.96    
## 15 Potato Red my_ets s[-11]  4.46

The model appears to leave white noise residuals, and exhibits a clear and constant seasonality, prices increasing towards the end of the year.

fit_my_ets |>
  components() |>
  autoplot()

The model appears to adjust better than the other models but still exhibits some autocorrelation in the errors, but they appear to be normally distributed with mean 0.

fit_my_ets |>
  gg_tsresiduals()

fit_ets <- potato_constant_prices |>
  model(ets = ETS())
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.

The automatic model estimated with fpp3 seems to be almost the same model as we tought, without a trend, but with some adjustments in the parameteres.

fit_ets |> 
  tidy()
## # A tibble: 15 × 4
##    Commodity  .model term    estimate
##    <chr>      <chr>  <chr>      <dbl>
##  1 Potato Red ets    alpha   1.00    
##  2 Potato Red ets    gamma   0.000101
##  3 Potato Red ets    l[0]   25.1     
##  4 Potato Red ets    s[0]    1.04    
##  5 Potato Red ets    s[-1]   0.885   
##  6 Potato Red ets    s[-2]   0.777   
##  7 Potato Red ets    s[-3]   0.741   
##  8 Potato Red ets    s[-4]   0.663   
##  9 Potato Red ets    s[-5]   0.677   
## 10 Potato Red ets    s[-6]   0.867   
## 11 Potato Red ets    s[-7]   1.11    
## 12 Potato Red ets    s[-8]   1.34    
## 13 Potato Red ets    s[-9]   1.44    
## 14 Potato Red ets    s[-10]  1.30    
## 15 Potato Red ets    s[-11]  1.16

The seasonality appears to be the same as our model being constant all over the observed periods, increasing the prices towards the end of the year and the residuals appears to be white noise.

fit_ets |>
  components() |>
  autoplot()

This model fits better the data, the residuals besides 1 does not exhibit autocorrelation and the errors appears to be normaly distributed with 0 mean.

fit_ets |>
  gg_tsresiduals()

The p-value of the Ljung-Box test of the potato series is 0.29, therefore we fail to reject the null hypothesis, suggesting that there is no significant evidence of autocorrelation in the residuals.

augment(fit_ets) %>%
  features(.resid, ljung_box, lag=24, dof=0)
## # A tibble: 1 × 4
##   Commodity  .model lb_stat lb_pvalue
##   <chr>      <chr>    <dbl>     <dbl>
## 1 Potato Red ets       27.3     0.289

The tables show the performance of the exponential smoothing model compared to the 4 most basic models, the first table is forecasting a 12 month period and the second table forecasting only 1 month. As seen by the tables the ETS model performs much better in both forecasting than the simple methods, in almost every single metric it outperforms the simpler methods, but also the residuals of the ETS model appears to be white noise.

fit_cv <- potato_constant_prices |>
  stretch_tsibble(.init = 18, .step = 1) |>
  model(
    Seasonal_naive = SNAIVE(Average_constant_prices),
    Naive = NAIVE(Average_constant_prices),
    Drift = RW(Average_constant_prices ~ drift()),
    Mean = MEAN(Average_constant_prices),
    ETS = ETS())
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.
fc_cv <- fit_cv %>%
  forecast(h=12)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 5 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Test  13.3  10.2   38.4 1.43  1.47 
## 2 ETS            Test   8.31  6.47  24.7 0.909 0.920
## 3 Mean           Test  10.4   7.64  28.6 1.07  1.15 
## 4 Naive          Test  12.5   9.40  35.4 1.32  1.39 
## 5 Seasonal_naive Test   8.71  6.80  24.4 0.955 0.965
fc_cv <- fit_cv %>%
  forecast(h=1)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 5 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Test   5.58  4.24  15.3 0.596 0.618
## 2 ETS            Test   4.22  3.23  11.6 0.453 0.468
## 3 Mean           Test  10.3   7.73  30.4 1.09  1.14 
## 4 Naive          Test   5.54  4.21  15.2 0.590 0.613
## 5 Seasonal_naive Test   9.01  7.08  26.5 0.995 0.998

6. ARIMA

The best auto ARIMA model is ARIMA(2,0,0)(1,1,0). Let’s break down what this implies: 1. Auto regressive (AR) terms (p): 2 auto regressive terms, this is expected since the exponential smoothing model revealed that the prices are highly dependent on the previous values (weighted more). 2. Difference (d): 0, this implies that the series appears to be stationary and that no differencing is needed to make the series stationary, this is unusual of economic series, where most of them are Integrated of order 1. 3. Moving Average (MA) terms (q): is zero, meaning that there is no need to include past errors to forecast the series. 4. Seasonal Autoregressive (SAR) terms (P): the P is 1, meaning that the current observation is modeled as a linear combination of the previous observation from the same season, we concluded by plotting the series that there is a high seasonal component, specially towards the end of the year where the prices tend to increase. 5. Seasonal Differencing (D): this is almost equivalent to the number 2 to difference the series, in this case the season is differenced to make it stationary because of the strong seasonal component of the series. 6. Seasonal Moving Average (SMA) terms (Q):also equivalent to the corresponding moving average term, in this series is zero. meaning that there is no need to include past errors to forecast the series. 7. Seasonal Period ([s]): this is the final component which is 12, meaning that the granularity of the data is monthly and the seasonal component is repeated every year.

fit_arima <- potato_constant_prices |>
  model(ets = ARIMA()) |>
  report()
## Model not specified, defaulting to automatic modelling of the
## `Average_constant_prices` variable. Override this using the model formula.
## Series: Average_constant_prices 
## Model: ARIMA(2,0,0)(1,1,0)[12] 
## 
## Coefficients:
##          ar1      ar2     sar1
##       1.0756  -0.2719  -0.4160
## s.e.  0.1131   0.1113   0.1198
## 
## sigma^2 estimated as 15.56:  log likelihood=-229.19
## AIC=466.38   AICc=466.9   BIC=476.01

The residuals of the auto arima appears to be white noise, it does not exhibit autocorrelation on the errors and they appear to be normally distributed around a zero mean.

fit_arima |>
  gg_tsresiduals()

The following table compares the ARIMA model with the previous models: exponential smoothing, seasonal naive, naive, drift and mean. The 4 most simple models does not provide the best estimate and moreover the residuals are not white noise. Between ARIMA and ETS the decision becomes narrower, both perform good and the residuals resemble white noise. However, the ETS models seems to perform better on both 1 month forecast and 12 months forecast in almost every metric. Therefore, we conclude that the best model to forecast the Potato Prices is the ETS.

fit_cv <- potato_constant_prices |>
  stretch_tsibble(.init = 18, .step = 1) |>
  model(
    Seasonal_naive = SNAIVE(Average_constant_prices),
    Naive = NAIVE(Average_constant_prices),
    Drift = RW(Average_constant_prices ~ drift()),
    Mean = MEAN(Average_constant_prices),
    ETS = ETS(),
    Arima = ARIMA())
## Model not specified, defaulting to automatic modelling of the `Average_constant_prices` variable. Override this using the model formula.
## Model not specified, defaulting to automatic modelling of the `Average_constant_prices` variable. Override this using the model formula.
fc_cv <- fit_cv %>%
  forecast(h=12)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 6 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Arima          Test   8.90  6.28  22.6 0.882 0.986
## 2 Drift          Test  13.3  10.2   38.4 1.43  1.47 
## 3 ETS            Test   8.31  6.47  24.7 0.909 0.920
## 4 Mean           Test  10.4   7.64  28.6 1.07  1.15 
## 5 Naive          Test  12.5   9.40  35.4 1.32  1.39 
## 6 Seasonal_naive Test   8.71  6.80  24.4 0.955 0.965
fc_cv <- fit_cv %>%
  forecast(h=1)

suppressMessages({fc_cv %>%
  accuracy(potato_constant_prices) %>%
  select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)})
## # A tibble: 6 × 7
##   .model         .type  RMSE   MAE  MAPE  MASE RMSSE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Arima          Test   4.58  3.47  12.7 0.487 0.507
## 2 Drift          Test   5.58  4.24  15.3 0.596 0.618
## 3 ETS            Test   4.22  3.23  11.6 0.453 0.468
## 4 Mean           Test  10.3   7.73  30.4 1.09  1.14 
## 5 Naive          Test   5.54  4.21  15.2 0.590 0.613
## 6 Seasonal_naive Test   9.01  7.08  26.5 0.995 0.998

7. Other forecasting methods

For other forecasting methods this time series will be forecasted with Prophet, developed by Taylor and Letham (2017) at Meta Platforms, Inc. Formerly named Facebook, Inc. The code is implemented following the Quickstart R API retrived from https://facebook.github.io/prophet/docs/quick_start.html#r-api.

Prophet requires two columns, “ds” to be a date column and “y” to be a numeric column to forecast.

df <- potato_constant_prices |> as_tibble() |>
  mutate(ds = Date,
         y = Average_constant_prices) |>
  select(ds, y)

The prophet function allows to calculate a forecast of the series.

m <- prophet(df, seasonality.mode = 'multiplicative')
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

The following table shows the last values of the forecast made with prophet, as seen even if the data is monthly prophet will make a daily forecast.

future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
##             ds     yhat yhat_lower yhat_upper
## 454 2022-04-26 33.26628   27.03555   39.51099
## 455 2022-04-27 31.47488   25.51133   37.61930
## 456 2022-04-28 30.28656   23.77906   36.64366
## 457 2022-04-29 29.76002   23.26951   36.12738
## 458 2022-04-30 29.93481   23.55781   35.81287
## 459 2022-05-01 30.83011   24.61078   36.93198

The following plot shows the actual vs fitted values and a forecast of 1 year. The model adjusts good to the actual values. However, the forecast doesn’t look good, it does not follow well the seasonal pattern of the data, with higher prices at the end of the year, it just follows a noise and does not seem plausible.

plot(m, forecast)

The components of the forecast can be seen in the following plot, the trend seems to decrease from 2014 to 2017 and then increase sharply, meanwhile the seasonality appears to be extremely ciclycal, and almost constant across the years, which was not seen in the previus plots and does not seem plausible.

prophet_plot_components(m, forecast)

Finally, the RMSE is calculated using the same rolling window used for the fable models and 1 month forecast. since prophet does not provide an easier way to calculate this compared to fpp3 the code is a little verbose. The RMSE is 7.16 which is higher than ETS and ARIMA for 1 month forecast, as seen this method is worse than ETS, ARIMA, Drift and Naive as measured by RMSE, this was expected by looking at the plots, the model fails to capture seasonality and does not provide a good forecast compared to ETS and ARIMA.

init_obs <- 18

# Define the number of folds
n_folds <- nrow(df) - init_obs + 1

prophet_rolling_cv <- function(df, init_obs, n_folds) {
  cv_results <- numeric(n_folds)
  
  for (i in 1:n_folds) {
    train <- df[1:(init_obs + i - 1), ]
    test <- df[(init_obs + i):(init_obs + i), ]
    
    # Fit Prophet model on training data
    suppressMessages({ m <- prophet(train) })
    
    suppressMessages({ future <- make_future_dataframe(m, periods = nrow(test), freq = "month") })
    forecast <- predict(m, future)
    
    # Calculate RMSE
    suppressMessages({ cv_results[i] <- sqrt(mean((forecast$yhat[-(1:nrow(train))] - test$y)^2)) })
  }
  
  suppressMessages({ return(mean(cv_results, na.rm = TRUE)) })
}

suppressMessages({ cv_rmse <- prophet_rolling_cv(df, init_obs, n_folds) })

print(paste0("Prophet RMSE:",cv_rmse))
## [1] "Prophet RMSE:7.16472563408317"